Analysis for Car_Sales_Data¶

In this project, I analyze a car sales dataset that records several properties of each car.

Car_Sales_Data field Description¶


  • Below is a description of column fields in the dataset:

Manufacturer: The brand or company that made the car (e.g., Ford, Toyota, VW, Porsche).

Model: The specific model name of the car (e.g., Fiesta, Golf, Prius).

Engine size: The size of the engine in liters (e.g., 1.6 = 1.6-liter engine).

Fuel type: The type of fuel the car uses (e.g., Petrol).

Year of manufacture: The year the car was built.

Mileage: The total distance the car has driven, measured in kilometers.

Price: The current price of the car

Questions to Be Answered by the Analysis¶

What is the relationship between engine size and car price?

What is the relationship between car model and price?

What is the relationship between engine size and mileage?

What is the relationship between year of manufacture and car price?

What is the relationship between car model and engine size?

In [1]:
## load needed modules
import pandas as pd 
In [2]:
## display all data columns 
pd.options.display.max_columns=None 
In [3]:
## load the dataset into DataFrame 
df=pd.read_csv(r"C:\Users\DR SYSTEM\Downloads\car_sales_data.csv")
In [4]:
## display first rows 
df.head(2)
Out[4]:
  Manufacturer       Model  Engine size Fuel type  Year of manufacture  Mileage  Price
0         Ford      Fiesta          1.0    Petrol                 2002   127300   3074
1      Porsche  718 Cayman          4.0    Petrol                 2016    57850  49704
In [5]:
##check for DataFrame shape 
df.shape
Out[5]:
(50000, 7)
  • The data has 50,000 rows and 7 columns.
In [6]:
## check for data info (quality)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Manufacturer         50000 non-null  object 
 1   Model                50000 non-null  object 
 2   Engine size          50000 non-null  float64
 3   Fuel type            50000 non-null  object 
 4   Year of manufacture  50000 non-null  int64  
 5   Mileage              50000 non-null  int64  
 6   Price                50000 non-null  int64  
dtypes: float64(1), int64(3), object(3)
memory usage: 2.7+ MB
In [7]:
#list all data columns
df.columns
Out[7]:
Index(['Manufacturer', 'Model', 'Engine size', 'Fuel type',
       'Year of manufacture', 'Mileage', 'Price'],
      dtype='object')

Feature Engineering¶

  • add High Mileage Flag column
  • add car age column
  • add price per km column
In [8]:
#copy the dataframe 
df_copy=df.copy()
In [9]:
#check for duplicates 
df.duplicated().sum()
Out[9]:
12
  • There are 12 duplicated rows.
In [10]:
#drop duplicates 
df.drop_duplicates(inplace=True)
In [11]:
#check
df.duplicated().sum()
Out[11]:
0
In [12]:
#check for null values
df.isnull().sum()
Out[12]:
Manufacturer           0
Model                  0
Engine size            0
Fuel type              0
Year of manufacture    0
Mileage                0
Price                  0
dtype: int64
  • There are no null values.
In [13]:
#check for data size 
df.shape
Out[13]:
(49988, 7)
In [14]:
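#add High Mileage Flag column: "high" if Mileage is above 1500 km, otherwise "low"
#(column name written without the trailing space so it is easier to reference later)
df['High Mileage Flag']=df['Mileage'].apply(lambda x : "high" if x >1500 else "low")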
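The same flag can be built without a row-wise `apply`. The following is only a sketch: `add_mileage_flag` and its `threshold` parameter are hypothetical names introduced here, and the toy rows are made up rather than taken from car_sales_data.csv. It uses `numpy.where`, which evaluates the comparison on the whole column at once.

```python
import numpy as np
import pandas as pd

def add_mileage_flag(frame, threshold=1500, column="High Mileage Flag"):
    """Return a copy of frame with a 'high'/'low' label derived from Mileage."""
    out = frame.copy()
    # Strictly greater than the threshold counts as "high", as in the cell above
    out[column] = np.where(out["Mileage"] > threshold, "high", "low")
    return out

# Toy data for illustration only (hypothetical values)
toy = pd.DataFrame({"Mileage": [630, 1500, 127300]})
flagged = add_mileage_flag(toy)
print(flagged)
```

Keeping the threshold as a parameter makes it easy to try other cut-offs and see how many cars fall on each side.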
In [15]:
#add car age column
# load needed modules
from datetime import datetime
current_year=datetime.now().year 
df['car_age']=current_year-df['Year of manufacture']
print(df[['Model','Year of manufacture','car_age']].head())
        Model  Year of manufacture  car_age
0      Fiesta                 2002       23
1  718 Cayman                 2016        9
2      Mondeo                 2014       11
3        RAV4                 1988       37
4        Polo                 2006       19
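Because `datetime.now().year` changes over time, the `car_age` values shift whenever the notebook is rerun in a later year. One possible way to keep results reproducible is to pass the reference year explicitly. This is a sketch under assumptions: `add_car_age` and `reference_year` are hypothetical names, and 2025 is used only because it matches the ages printed above.

```python
import pandas as pd

def add_car_age(frame, reference_year):
    """Return a copy of frame with car_age = reference_year - Year of manufacture."""
    out = frame.copy()
    out["car_age"] = reference_year - out["Year of manufacture"]
    return out

# Two rows mirroring the Fiesta and 718 Cayman years shown above
toy = pd.DataFrame({"Model": ["Fiesta", "718 Cayman"],
                    "Year of manufacture": [2002, 2016]})
aged = add_car_age(toy, reference_year=2025)
print(aged)
```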
In [16]:
#add price per km column
df['price per km']=df['Price'] / df['Mileage']
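The division above is safe for this dataset as long as no car has a mileage of zero. A defensive variant, sketched below, masks non-positive mileage before dividing so that such rows become NaN instead of infinity. `add_price_per_km` is a hypothetical helper, and the second toy row is invented to show the zero-mileage case.

```python
import pandas as pd

def add_price_per_km(frame):
    """Return a copy of frame with Price / Mileage, NaN where Mileage <= 0."""
    out = frame.copy()
    # Series.where keeps values where the condition holds and sets NaN elsewhere
    safe_mileage = out["Mileage"].where(out["Mileage"] > 0)
    out["price per km"] = out["Price"] / safe_mileage
    return out

# First row mirrors the Fiesta above; second row is a made-up zero-mileage car
toy = pd.DataFrame({"Price": [3074, 1000], "Mileage": [127300, 0]})
priced = add_price_per_km(toy)
print(priced)
```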
In [17]:
#check DataFrame 
df.head(1)
Out[17]:
  Manufacturer   Model  Engine size Fuel type  Year of manufacture  Mileage  Price High Mileage Flag  car_age  price per km
0         Ford  Fiesta          1.0    Petrol                 2002   127300   3074              high       23      0.024148
In [33]:
df.describe()
Out[33]:
        Engine size  Year of manufacture        Mileage          Price       car_age  price per km
count  49988.000000         49988.000000   49988.000000   49988.000000  49988.000000  49988.000000
mean       1.773140          2004.209630  112515.561215   13829.112387     20.790370      0.492719
std        0.734149             9.646056   71624.341062   16417.812203      9.646056      1.844404
min        1.000000          1984.000000     630.000000      76.000000      3.000000      0.000180
25%        1.400000          1996.000000   54375.250000    3059.750000     13.000000      0.020147
50%        1.600000          2004.000000  101011.500000    7971.000000     21.000000      0.081169
75%        2.000000          2012.000000  158617.250000   19028.500000     29.000000      0.349427
max        5.000000          2022.000000  453537.000000  168081.000000     41.000000    113.993976

Q1:What is the relationship between engine size and car price?¶

In [18]:
#load needed modules
import plotly.express as px
In [19]:
px.scatter(df,x='Engine size',y='Price',trendline='ols')

As engine size increases, the car price tends to increase.
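The trend read from the scatter plot can also be expressed as numbers. The sketch below is illustrative only: `linear_trend` is a hypothetical helper and the toy rows are made up, not taken from car_sales_data.csv. It fits a straight line with `numpy.polyfit` and reports the Pearson correlation, which is roughly what an OLS trendline summarizes. Note that `trendline='ols'` in plotly express needs the statsmodels package to be installed.

```python
import numpy as np
import pandas as pd

def linear_trend(frame, x, y):
    """Return (slope, intercept, pearson_r) for a straight-line fit of y on x."""
    slope, intercept = np.polyfit(frame[x].to_numpy(), frame[y].to_numpy(), deg=1)
    r = frame[x].corr(frame[y])  # Series.corr uses Pearson by default
    return slope, intercept, r

# Toy data for illustration only (hypothetical values on an exact line)
toy = pd.DataFrame({"Engine size": [1.0, 2.0, 4.0],
                    "Price": [3000, 5000, 9000]})
slope, intercept, r = linear_trend(toy, "Engine size", "Price")
print(f"slope={slope:.1f}, intercept={intercept:.1f}, r={r:.3f}")
```

On the real data, calling `linear_trend(df, 'Engine size', 'Price')` would give the price change per additional liter of engine size along with the correlation strength.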

Q2:What is the relationship between car model and price?¶

In [20]:
#load needed modules
import seaborn as sns 
import matplotlib.pyplot as plt
In [21]:
sns.scatterplot(data=df,x='Price',y='Model')
plt.title('Model vs Price')
plt.xlabel('Price')
plt.ylabel('Model')
plt.show()

Luxury and sports models like Porsche 911, M5, and Cayenne tend to have much higher prices compared to other models like Fiesta, Yaris, or Polo.
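A ranked table can make this comparison easier to read than a scatter plot with one row per model. The following is a minimal sketch, assuming a hypothetical helper `median_price_by_model` and invented toy prices. The median is used here because it is less sensitive than the mean to a few very expensive cars.

```python
import pandas as pd

def median_price_by_model(frame):
    """Return the median Price per Model, most expensive first."""
    return (frame.groupby("Model")["Price"]
                 .median()
                 .sort_values(ascending=False))

# Toy data for illustration only (hypothetical prices)
toy = pd.DataFrame({"Model": ["Fiesta", "Fiesta", "911", "911"],
                    "Price": [3000, 5000, 60000, 80000]})
ranking = median_price_by_model(toy)
print(ranking)
```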

Q3:What is the relationship between engine size and mileage?¶

In [22]:
##load needed modules
import seaborn as sns 
import matplotlib.pyplot as plt
import pandas as pd
In [23]:
columns=['Engine size','Mileage']
correlation=df[columns].corr()
print(correlation)

sns.heatmap(correlation,annot=True)
plt.title('the relationship between engine size and mileage')
plt.show()
             Engine size   Mileage
Engine size     1.000000  0.004365
Mileage         0.004365  1.000000

There is almost no correlation between engine size and mileage
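The Pearson coefficient only measures linear association, so a rank-based check can be a useful complement. The sketch below computes a Spearman correlation as the Pearson correlation of the ranks, which avoids any extra dependency. `spearman_rho` is a hypothetical helper and the toy values are made up to show a monotonic but non-linear relationship.

```python
import pandas as pd

def spearman_rho(frame, x, y):
    """Spearman rank correlation, computed as Pearson correlation of the ranks."""
    return frame[x].rank().corr(frame[y].rank())

# Toy data for illustration only: y grows with x, but not along a straight line
toy = pd.DataFrame({"x": [1, 2, 3, 4], "y": [1, 4, 9, 16]})
rho = spearman_rho(toy, "x", "y")
pearson = toy["x"].corr(toy["y"])
print(f"spearman={rho:.3f}, pearson={pearson:.3f}")
```

If `spearman_rho(df, 'Engine size', 'Mileage')` is also close to zero on the real data, that would support the conclusion that the two columns are unrelated rather than related in a non-linear way.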

Q4:What is the relationship between year of manufacture and car price?¶

In [24]:
#load needed modules
import plotly.express as px
In [25]:
px.scatter(df,x= 'Year of manufacture',y='Price',trendline='ols')

As year of manufacture increases (newer cars), the car price tends to increase.
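To look at how price changes with year of manufacture numerically, the prices can be aggregated per year. This is a sketch under assumptions: `price_by_year` is a hypothetical helper and the toy rows are invented. It returns the mean price and the number of cars for each year, so years with few cars can be spotted before drawing conclusions from them.

```python
import pandas as pd

def price_by_year(frame):
    """Return mean Price and row count per Year of manufacture, oldest first."""
    return (frame.groupby("Year of manufacture")["Price"]
                 .agg(["mean", "count"])
                 .sort_index())

# Toy data for illustration only (hypothetical prices)
toy = pd.DataFrame({"Year of manufacture": [2016, 2002, 2002],
                    "Price": [50000, 3000, 5000]})
summary = price_by_year(toy)
print(summary)
```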

Q5:What is the relationship between car model and engine size?¶

In [30]:
#load needed modules
import seaborn as sns
In [32]:
sns.barplot(data=df,x='Engine size' , y='Model')
plt.title('relationship between car model and engine size')
plt.xlabel('Engine size')
plt.ylabel('Model')
plt.show()
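By default `sns.barplot` draws the mean of the numeric variable for each category, so the bar lengths correspond to the mean engine size per model. The sketch below computes those means directly. `mean_engine_size_by_model` is a hypothetical helper and the toy rows are made up for illustration.

```python
import pandas as pd

def mean_engine_size_by_model(frame):
    """Return the mean Engine size per Model, largest first."""
    return (frame.groupby("Model")["Engine size"]
                 .mean()
                 .sort_values(ascending=False))

# Toy data for illustration only (hypothetical engine sizes)
toy = pd.DataFrame({"Model": ["Fiesta", "Fiesta", "718 Cayman"],
                    "Engine size": [1.0, 1.4, 4.0]})
engine_means = mean_engine_size_by_model(toy)
print(engine_means)
```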

Conclusion¶

  • The data has 50,000 rows and 7 columns (49,988 rows after dropping 12 duplicates).
  • As engine size increases, the car price tends to increase
  • Luxury and sports models like Porsche 911, M5, and Cayenne tend to have much higher prices compared to other models like Fiesta, Yaris, or Polo.
  • There is almost no correlation between engine size and mileage
  • As year of manufacture increases (newer cars), the car price tends to increase.